Abstract:
The creation of an image from another image, or from different types of data including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, capturing images of an object or product from different views can be exhausting and expensive to do manually. Using deep learning and artificial intelligence techniques, the generation of new images from different types of data has become possible, and significant effort has recently been devoted to developing image generation strategies, with great success. To that end, we present in this paper, to the best of the authors' knowledge, the first comprehensive overview of existing image generation methods. Each image generation technique is described based on the nature of the adopted algorithms, the type of data used, and the main objective, and each image generation category is discussed by presenting its proposed approaches. In addition, existing image generation datasets are presented. The evaluation metrics suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to characterize the state of the art and identify its limitations and strengths. Lastly, the current challenges facing this field are presented.
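As an illustration of the evaluation metrics such surveys discuss (not part of the paper itself), here is a minimal sketch of the Fréchet distance computation that underlies the widely used FID score, assuming feature means and covariances have already been extracted with an Inception-style network:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between two Gaussians fitted to real
    and generated image features (the core computation behind FID)."""
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```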
Abstract:
Face sketch-photo synthesis can be regarded as an image-to-image translation problem. Although many generative models achieve good translations from sketches to photos, they still have limitations in preserving face identity due to the huge modality gap between the two domains. To address this, we propose an identity-preserved adversarial model (IPAM), which includes an extended U-Net to increase the weight of the original sketch in translation, two discriminators focusing on real or fake image concatenations across the two domains to learn more styles of the target domain, and an identity constraint that requires the generated images and the real targets to have zero cosine distance in feature space. We evaluate our method on two face sketch databases using face recognition. The results demonstrate that our translation method is superior to existing methods in preserving face identity information.
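The identity constraint drives the cosine distance between generated and real faces to zero in a recognition feature space. A minimal PyTorch sketch of this idea, assuming a pretrained face feature extractor `face_net` (a placeholder name, not from the paper):

```python
import torch.nn.functional as F

def identity_loss(face_net, fake_photos, real_photos):
    # Embed both images with a (frozen) face recognition network and
    # penalize their cosine distance; zero distance means the generated
    # photo keeps the identity of the real target.
    f_fake = face_net(fake_photos)
    f_real = face_net(real_photos).detach()
    return (1.0 - F.cosine_similarity(f_fake, f_real, dim=1)).mean()
```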
Abstract:
Sketch-to-photo face generation has recently gained remarkable attention in the computer vision and signal processing communities, because sketches employing concise lines are easily available and can conveniently describe significant facial attributes. Most existing sketch-to-photo works fail to maintain geometric structures and improve local details simultaneously, which limits their performance. In this work, we propose a two-stage sketch-to-photo generative adversarial network for face generation. In the first stage, we propose a semantic loss to maintain semantic consistency. In the second stage, we define the similar connected component and propose a color refinement loss to generate fine-grained details. Moreover, we introduce a multi-scale discriminator and design a patch-level local discriminator. We also propose a texture loss to enhance the local fidelity of synthesized images. Experiments show that our proposed method generates significantly better results than state-of-the-art methods while preserving facial attributes.
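A patch-level local discriminator judges realism per image patch rather than per whole image. A minimal PatchGAN-style sketch in PyTorch (the layer sizes are illustrative assumptions, not the paper's exact configuration):

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a grid of real/fake scores, one per receptive-field patch."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # patch score map
        )

    def forward(self, x):
        return self.net(x)
```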
Abstract:
Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input. To address this issue, our key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. We take a local-to-global approach. We first learn feature embeddings of key face components, and push corresponding parts of input sketches towards underlying component manifolds defined by the feature vectors of face component samples. We also propose another deep neural network to learn the mapping from the embedded component features to realistic images, with multi-channel feature maps as intermediate results to improve the information flow. Our method essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches. Our tool is easy to use even for non-artists, while still supporting fine-grained control of shape details. Both qualitative and quantitative evaluations show the superior generation ability of our system to existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.
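The manifold projection step can be approximated by blending a query embedding with its nearest component embeddings from the training samples. A hypothetical NumPy sketch of this idea (the function name and the uniform-weight blend are assumptions; the paper's interpolation scheme may differ):

```python
import numpy as np

def project_to_manifold(query, samples, k=10):
    """Pull a sketched component's embedding toward the manifold spanned
    by training samples by averaging its k nearest neighbors."""
    dists = np.linalg.norm(samples - query, axis=1)  # distance to each sample
    nearest = samples[np.argsort(dists)[:k]]         # k closest embeddings
    return nearest.mean(axis=0)                      # blended, on-manifold point
```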
Abstract:
Generative Adversarial Networks (GANs) have achieved inspiring performance in both unsupervised image generation and conditional cross-modal image translation. However, how to generate quality images at an affordable cost is still challenging. We argue that it is the vast number of easy examples that disturbs the training of GANs, and propose to address this problem by down-weighting the losses assigned to easy examples. Our novel Incremental Focal Loss (IFL) progressively focuses training on hard examples and prevents easy examples from overwhelming the generator and discriminator during training. In addition, we propose an enhanced self-attention (ESA) mechanism to boost the representational capacity of the generator. We apply IFL and ESA to a number of unsupervised and conditional GANs, and conduct experiments on various tasks, including face photo-sketch synthesis, map⇔aerial-photo translation, single-image super-resolution reconstruction, and image generation on CelebA, LSUN, and CIFAR-10. Results show that IFL boosts the learning of GANs over existing loss functions. Besides, both IFL and ESA enable GANs to produce quality images with realistic details in all these tasks, even when no task adaptation is involved.
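A minimal sketch of the incremental focal idea applied to a discriminator's binary cross-entropy, where the focusing exponent grows during training so that hard examples dominate later (the paper's exact formulation and schedule may differ):

```python
import torch
import torch.nn.functional as F

def incremental_focal_bce(logits, targets, step, total_steps, gamma_max=2.0):
    # Focusing parameter grows from 0 (plain BCE) toward gamma_max,
    # progressively down-weighting easy examples as training advances.
    gamma = gamma_max * min(step / total_steps, 1.0)
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return ((1 - p_t) ** gamma * bce).mean()     # easy examples get small weight
```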
Abstract:
Image generation has received increasing attention because of its wide applications in security and entertainment. Sketch-based face generation brings more fun and better quality to image generation thanks to supervised interaction. However, when a sketch poorly aligned with the true face is given as input, existing supervised image-to-image translation methods often cannot generate acceptable photo-realistic face images. To address this problem, in this paper we propose Cali-Sketch, a method for generating photo-realistic images from human-like sketches. Cali-Sketch explicitly models stroke calibration and image generation using two constituent networks: a Stroke Calibration Network (SCN), which calibrates the strokes of facial features and enriches facial details while preserving the original intent; and an Image Synthesis Network (ISN), which translates the calibrated and enriched sketches into photo-realistic face images. In this way, we manage to decouple a difficult cross-domain translation problem into two easier steps. Extensive experiments verify that the face photos generated by Cali-Sketch are both photo-realistic and faithful to the input sketches, compared with state-of-the-art methods.
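At inference time the decoupling amounts to chaining the two networks. A schematic PyTorch sketch (the module names mirror the abstract; the network architectures themselves are placeholders, not the paper's designs):

```python
import torch.nn as nn

class CaliSketchPipeline(nn.Module):
    """Two-step sketch-to-photo: calibrate strokes, then synthesize."""
    def __init__(self, scn: nn.Module, isn: nn.Module):
        super().__init__()
        self.scn = scn  # Stroke Calibration Network
        self.isn = isn  # Image Synthesis Network

    def forward(self, rough_sketch):
        calibrated = self.scn(rough_sketch)  # refine poorly aligned strokes
        return self.isn(calibrated)          # translate to a face photo
```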
Abstract:
Sketching has become fashionable with the increasing availability of touch screens on portable devices. It is typically used for rendering the visual world, automatic sketch style recognition and abstraction, sketch-based image retrieval (SBIR), and sketch-based perceptual grouping. How to automatically generate a sketch from a real image remains an open question. We propose a convolutional neural network-based model, named SG-Net, to generate sketches from natural images. SG-Net is trained to learn the relationship between images and sketches, and thus makes full use of edge information to generate a rough sketch. Mathematical morphology is then applied as a postprocessing step to eliminate redundant artifacts in the generated sketches. In addition, to increase the diversity of the generated sketches, we introduce thin plate splines to generate more sketches with different styles. We evaluate the proposed sketch generation method both quantitatively and qualitatively on a challenging dataset, and our approach achieves performance superior to established methods. Moreover, we conduct extensive experiments on the SBIR task; the results on the Flickr15k dataset demonstrate that our method improves retrieval performance compared with state-of-the-art methods.
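The morphological postprocessing can be illustrated with a binary opening, which removes small isolated artifacts from a generated sketch map. A minimal SciPy sketch (the threshold and structuring-element size are assumptions, not the paper's settings):

```python
import numpy as np
from scipy.ndimage import binary_opening

def clean_sketch(sketch, threshold=0.5):
    """Remove small redundant artifacts from a generated sketch map."""
    binary = sketch > threshold  # foreground strokes as booleans
    # Opening = erosion then dilation: tiny specks vanish, strokes survive.
    opened = binary_opening(binary, structure=np.ones((3, 3)))
    return opened.astype(sketch.dtype)
```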
Abstract:
Recent deep generative models allow real-time generation of hair images from sketch inputs. Existing solutions often require a user-provided binary mask to specify a target hair shape. This not only costs users extra labor but also fails to capture complicated hair boundaries. Such solutions usually encode hair structures via orientation maps, which, however, are not very effective at encoding complex structures. We observe that colored hair sketches already implicitly define target hair shapes as well as hair appearance, and are more flexible than orientation maps for depicting hair structures. Based on these observations, we present SketchHairSalon, a two-stage framework for generating realistic hair images directly from freehand sketches depicting the desired hair structure and appearance. At the first stage, we train a network to predict a hair matte from an input hair sketch, with an optional set of non-hair strokes. At the second stage, another network is trained to synthesize the structure and appearance of hair images from the input sketch and the generated matte. To make the networks in the two stages aware of the long-term dependency of strokes, we apply self-attention modules to them. To train these networks, we present a new dataset containing thousands of annotated hair sketch-image pairs and corresponding hair mattes. Two efficient sketch completion methods are proposed to automatically complete repetitive braided parts and hair strokes, respectively, thus reducing the workload of users. Based on the trained networks and the two sketch completion strategies, we build an intuitive interface that allows even novice users to design visually pleasing hair images exhibiting various hair structures and appearance via freehand sketches. Qualitative and quantitative evaluations show the advantages of the proposed system over existing and alternative solutions.
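Self-attention modules for capturing long-term stroke dependency can follow the standard SAGAN formulation. A minimal PyTorch sketch (the channel-reduction factor is an assumption; the paper's exact module may differ):

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a feature map."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // reduction, 1)
        self.k = nn.Conv2d(ch, ch // reduction, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.k(x).flatten(2)                  # (b, c', hw)
        v = self.v(x).flatten(2)                  # (b, c, hw)
        attn = torch.softmax(q @ k, dim=-1)       # each position attends to all others
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out               # residual connection
```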
Abstract:
Face photo-sketch synthesis aims to generate face sketches from real photos and vice versa. It can be abstracted as a constrained quantization problem. Although many efforts have been dedicated to this problem, it is still challenging to synthesize detail-preserving photos or sketches due to the significant differences between the face sketch domain (drawn by people) and the photo domain (captured by cameras). In this paper, we propose a novel Identity-sensitive Generative Adversarial Network (IsGAN) to address it. Our key insight is to formalize face photo-sketch synthesis as a special case of image-to-image translation and to embed identity information through adversarial learning. In particular, an adversarial architecture is used to capture the differences between the two domains, and a new network loss, namely the identity recognition loss, is introduced to preserve the detailed identifiable information that is crucial for photo-sketch synthesis. In addition, to enforce structural consistency during generation, a cyclic-synthesized loss is applied between the generated image of one domain and the cycled image of the other. Experiments on the CUFS and CUFSF datasets suggest that our model achieves state-of-the-art performance in both qualitative and quantitative measures.
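The cyclic-synthesized loss compares the directly synthesized image of one domain with the image obtained by cycling through the other domain. A minimal PyTorch sketch, assuming generators `g_ps` (photo→sketch) and `g_sp` (sketch→photo), both hypothetical names:

```python
import torch.nn.functional as F

def cyclic_synthesized_loss(g_sp, g_ps, photo, sketch):
    fake_sketch = g_ps(photo)        # photo -> sketch
    fake_photo = g_sp(sketch)        # sketch -> photo
    cycled_photo = g_sp(fake_sketch)   # photo -> sketch -> photo
    cycled_sketch = g_ps(fake_photo)   # sketch -> photo -> sketch
    # Penalize mismatch between the synthesized and the cycled images
    # in each domain, enforcing structural consistency.
    return (F.l1_loss(fake_photo, cycled_photo)
            + F.l1_loss(fake_sketch, cycled_sketch))
```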
Abstract:
Sketch-to-image synthesis is a challenging computer vision task that generates photo-realistic images from given sketches. Existing methods of this kind are unable to discover the inherent semantic information contained in an image and use it to guide the synthesis process, which substantially reduces their capacity to generate photo-realistic images. Accordingly, in this paper, we propose a novel framework that explores and leverages semantic information to generate realistic textures in the synthesized images. More specifically, a segmentation map generation network is designed to learn the relationship between sketches and segmentation maps, in order to obtain semantic segmentation maps from sketches. Taking the semantic segmentation maps as the condition, a feature-wise affine transformation is then applied to the feature maps of the network's intermediate layers, which efficiently generates the texture required to synthesize more photo-realistic images. Extensive experiments demonstrate that, compared with other state-of-the-art sketch-to-image synthesis methods, our approach not only synthesizes images with significantly superior visual quality but also achieves better results on quantitative metrics.
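A feature-wise affine transformation conditioned on segmentation maps resembles SPADE/FiLM-style modulation: scale and shift parameters predicted from the semantic map modulate intermediate feature maps. A minimal PyTorch sketch (the class name and layer widths are assumptions, not the paper's architecture):

```python
import torch.nn as nn

class SegmentationFiLM(nn.Module):
    """Predicts per-channel scale and shift from a semantic map and
    applies them to an intermediate feature map."""
    def __init__(self, seg_ch, feat_ch, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, 3, padding=1), nn.ReLU())
        self.to_scale = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.to_shift = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, feat, seg):
        # Resize the segmentation map to the feature resolution first.
        seg = nn.functional.interpolate(seg, size=feat.shape[2:], mode="nearest")
        h = self.shared(seg)
        # Feature-wise affine transformation conditioned on semantics.
        return feat * (1 + self.to_scale(h)) + self.to_shift(h)
```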